
[bugfix] fix deepseek rope sincoscache re-generation #2744

Merged
MengqingCao merged 1 commit into vllm-project:main from zzzzwwjj:main
Sep 8, 2025

Conversation

@zzzzwwjj (Collaborator) commented Sep 4, 2025

What this PR does / why we need it?

The current implementation regenerates the sin_cos_cache in RoPE whenever kv_seqlen > 4k, because the cache is initialized with a length of only 4k.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

After this PR is merged, the sin_cos_cache will no longer grow in the forward function, so test_native_rope_deepseek_forward_cache_handling is no longer necessary.
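To make the cache-sizing change concrete, here is a minimal pure-Python sketch of the idea (class and method names are illustrative, not the actual vllm-ascend code): the cache length is fixed once in __init__ using the scaling factor, so the forward path never needs to rebuild the sin/cos tables.

```python
import math


class RopeCacheSketch:
    """Illustrative only: models the cache-sizing logic, not real RoPE math."""

    def __init__(self, max_position_embeddings=4096, scaling_factor=8.0):
        # Pre-allocate for the scaled maximum (the fix), instead of
        # only max_position_embeddings (the old behavior, which forced
        # a rebuild whenever kv_seqlen exceeded 4k).
        self.max_seq_len = math.ceil(max_position_embeddings * scaling_factor)

    def needs_regeneration(self, kv_seqlen):
        # With the pre-allocated cache, any kv_seqlen up to the scaled
        # maximum fits without regenerating the cache in forward().
        return kv_seqlen > self.max_seq_len
```

With the old 4k initialization, a kv_seqlen of 8192 would have triggered a rebuild; with the scaled pre-allocation it fits in the cache from the start.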

@gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request correctly addresses a bug in the DeepSeek RoPE implementation where the sincoscache was being regenerated unnecessarily for sequences longer than the initial max_position_embeddings. The fix involves pre-allocating a larger cache during initialization by incorporating the scaling_factor, and removing the dynamic resizing logic from the forward pass. These changes are consistently applied across both the standard and torchair implementations, and the corresponding obsolete tests are correctly removed. My review identifies a minor but important point for robustness: the calculation of max_seq_len results in a float, which could lead to implicit behavior dependencies. I've suggested using math.ceil to ensure it's an integer, which improves clarity and aligns with the expectations of parent classes.

Comment thread: vllm_ascend/ops/rotary_embedding.py (Outdated)

Contributor comment (severity: high):

The scaling_factor is a float, which results in self.max_seq_len being a float. While torch.arange can handle a float end value, it's safer and clearer to use an integer for a value representing a sequence length. This avoids reliance on the implicit behavior of torch.arange with floats and improves code robustness. The parent class RotaryEmbedding also expects an int for the sequence length in its _set_cos_sin_cache method signature. Using math.ceil will ensure the allocated cache is large enough and the length is an integer.

Suggested change:
- self.max_seq_len = max_position_embeddings * scaling_factor
+ self.max_seq_len = math.ceil(max_position_embeddings * scaling_factor)
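To illustrate the reviewer's point (the values below are illustrative, not taken from any model config): multiplying an int by a float scaling factor yields a float, whereas math.ceil returns an int and rounds up, so the allocated cache is never undersized.

```python
import math

max_position_embeddings = 4096
scaling_factor = 2.3  # illustrative non-integral factor

as_float = max_position_embeddings * scaling_factor           # float (not an int)
as_int = math.ceil(max_position_embeddings * scaling_factor)  # int, rounded up

# as_float cannot safely be used as a sequence length; as_int can,
# and rounding up guarantees the cache covers the scaled maximum.
```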

Contributor comment (severity: high):

The scaling_factor is a float, making self.max_seq_len a float. It is better practice to use an integer for sequence lengths to avoid relying on the implicit behavior of torch.arange with float inputs and to make the code more explicit and robust. The corresponding method in the parent RotaryEmbedding class also expects an integer. Using math.ceil ensures the length is an integer and the cache size is sufficient.

Suggested change:
- self.max_seq_len = max_position_embeddings * scaling_factor
+ self.max_seq_len = math.ceil(max_position_embeddings * scaling_factor)

@github-actions Bot commented Sep 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@codecov Bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 91.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.78%. Comparing base (24d4dad) to head (45dfe83).
⚠️ Report is 728 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...m_ascend/torchair/ops/torchair_rotary_embedding.py    50.00%    2 Missing ⚠️
.../ut/torchair/ops/test_torchair_rotary_embedding.py    93.75%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2744      +/-   ##
==========================================
- Coverage   73.71%   72.78%   -0.94%     
==========================================
  Files         152      154       +2     
  Lines       21967    21313     -654     
==========================================
- Hits        16194    15513     -681     
- Misses       5773     5800      +27     
Flag        Coverage Δ
unittests   72.78% <91.66%> (-0.94%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@wangxiyuan (Collaborator) commented:

This is a cherry-pick from #1551.

@zzzzwwjj force-pushed the main branch 4 times, most recently from f1cdcde to 6b4f8f0 on September 8, 2025 03:53
Signed-off-by: zzzzwwjj <1183291235@qq.com>
@MengqingCao MengqingCao merged commit 4df8df5 into vllm-project:main Sep 8, 2025
31 of 32 checks passed
1Fire4 pushed a commit to 1Fire4/vllm-ascend that referenced this pull request Sep 9, 2025
### What this PR does / why we need it?
The current implementation will result in duplicate generation of
`sin_cos_cache` in rope when `kv_seqlen` > 4k, because the
initialization length of the `sin_cos_cache` is only 4k.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
After this PR merged, sin_cos_cache will not increase in forward func,
so `test_native_rope_deepseek_forward_cache_handling` is not necessary.

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@60f0843

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Sep 10, 2025
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025


4 participants